Discriminative Method for Japanese Kana-Kanji Input Method

Authors

  • Hiroyuki Tokunaga
  • Daisuke Okanohara
  • Shinsuke Mori
Abstract

The most popular type of input method in Japan is kana-kanji conversion, that is, conversion from a string of kana to a mixed kanji-kana string. However, no study has applied discriminative methods such as structured SVMs to kana-kanji conversion. One reason is that learning a discriminative model from a large data set is often intractable. Thanks to recent research progress, however, large-scale learning of discriminative models has become feasible. In the present paper, we investigate whether discriminative methods such as structured SVMs can improve the accuracy of kana-kanji conversion. To the best of our knowledge, this is the first study comparing a generative model and a discriminative model for kana-kanji conversion. An experiment revealed that a discriminative method can improve the performance by approximately 3%.
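
To make the task in the abstract concrete, the following is a minimal sketch of kana-kanji conversion as a search over dictionary segmentations scored by a linear model. It is an illustration under stated assumptions, not the authors' implementation: the toy dictionary, the feature set (word unigrams and bigrams), the hand-set weights, and the function name `convert` are all hypothetical. In a discriminative setting such weights would be learned from data, for example by a structured SVM; here they are simply fixed by hand.

```python
# Hypothetical toy dictionary: kana reading -> candidate surface forms.
DICTIONARY = {
    "きしゃ": ["記者", "汽車", "貴社"],
    "かいけん": ["会見"],
}

# Hypothetical hand-set feature weights (a trained model would learn these).
WEIGHTS = {
    ("uni", "記者"): 0.3,
    ("uni", "汽車"): 0.5,
    ("uni", "貴社"): 0.2,
    ("bi", "記者", "会見"): 1.0,
}


def convert(kana, dictionary=DICTIONARY, weights=WEIGHTS):
    """Return the highest-scoring conversion of `kana`, or None if the
    dictionary cannot cover the whole input (Viterbi search)."""
    n = len(kana)
    # best[i][w] = (score, backpointer) for the best path that ends at
    # character position i with last word w.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for start in range(n):
        if not best[start]:
            continue  # no path reaches this position
        for end in range(start + 1, n + 1):
            for surface in dictionary.get(kana[start:end], []):
                for prev, (prev_score, _) in best[start].items():
                    score = (prev_score
                             + weights.get(("uni", surface), 0.0)
                             + weights.get(("bi", prev, surface), 0.0))
                    if surface not in best[end] or score > best[end][surface][0]:
                        best[end][surface] = (score, (start, prev))
    if not best[n]:
        return None
    # Follow backpointers from the best final word to the start symbol.
    word = max(best[n], key=lambda w: best[n][w][0])
    pos, out = n, []
    while word != "<s>":
        out.append(word)
        _, (pos, word) = best[pos][word]
    return "".join(reversed(out))


print(convert("きしゃ"))
print(convert("きしゃかいけん"))
```

With these toy weights, the isolated reading きしゃ is resolved by the unigram weights alone, while the bigram feature changes the choice when かいけん follows. This is the kind of context sensitivity that either a generative or a discriminative model has to capture.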

Similar resources

Large Scale Collocation Data and Their Application to Japanese Word Processor Technology

Word processors or computers used in Japan employ Japanese input method through keyboard stroke combined with Kana (phonetic) character to Kanji (ideographic, Chinese) character conversion technology. The key factor of Kana-to-Kanji conversion technology is how to raise the accuracy of the conversion through the homophone processing, since we have so many homophonic Kanjis. In this paper, we re...

Keyboards for inputting Japanese language-arxiv

The most commonly used Japanese alphabets are Kanji, Hiragana and Katakana. The Kanji alphabet includes pictographs or ideographic characters that were adopted from the Chinese alphabet. Hiragana and Katakana are phonetic alphabets that do not include any characters common to each other or to Kanji. Hiragana is used to spell words of Japanese origin, while Katakana is used to spell words of wes...

Candidate Display Styles in Japanese Input

Typing Japanese into computers consists of typing Roman alphabet, displaying the kana character, converting kana to kanji, and selecting the intended kanji character from a list of homophonic candidates. This paper presents a study of four candidate display styles, three commonly used in commercial products (“vertical,” “horizontal,” and “compact-horizontal”) and one novel (“matrix”), together ...

Recent Topics in Speech Recognition Research at NTT Laboratories

This paper introduces three recent topics in speech recognition research at NTT (Nippon Telegraph and Telephone) Human Interface Laboratories. The first topic is a new HMM (hidden Markov model) technique that uses VQ-code bigrams to constrain the output probability distribution of the model according to the VQ-codes of previous frames. The output probability distribution changes depending on th...

Kana-Kanji Conversion System with Input Support Based on Prediction

1 Introduction. TOSHIBA developed the world's first Japanese word processor in 1978. Unlike languages based on an alphabet, Japanese uses thousands of kanji characters of varying complexity. Hence, to arrange all of kanji characters on keyboard is difficult. On the other hand, kana characters which are phonetic scripts of Japanese have 83 variations; these can be arranged o...

Publication date: 2011